An efficient algorithm for mining high utility itemsets with negative item values in large databases

Authors

  • Chun-Jung Chu
  • Vincent S. Tseng
  • Tyne Liang

Abstract

Utility itemsets typically consist of items with different values, such as utilities, and the aim of utility mining is to identify the itemsets with the highest utilities. Past studies on utility mining considered only positive item values. In some applications, however, an itemset may be associated with negative item values. Hence, discovering high utility itemsets with negative item values is important for mining interesting patterns such as association rules. In this paper, we propose a novel method, namely HUINIV (High Utility Itemsets with Negative Item Values)-Mine, for efficiently and effectively mining high utility itemsets from large databases with consideration of negative item values. To the best of our knowledge, this is the first work that considers the concept of negative item values in utility mining. The novel contribution of HUINIV-Mine is that it identifies high utility itemsets while generating fewer high transaction-weighted utilization itemsets, which substantially reduces the execution time of mining. In this way, all high utility itemsets with negative item values can be discovered effectively with lower memory and CPU I/O requirements. This meets the critical requirements of temporal and spatial efficiency for mining high utility itemsets with negative item values. Experimental evaluation shows that HUINIV-Mine substantially outperforms other methods by generating far fewer candidate itemsets under different experimental conditions.

Mining of association rules in large databases is a well-studied technique in the field of data mining, with typical methods such as Apriori [1,2]. The problem of association rule mining can be decomposed into two steps. The first step involves finding all frequent itemsets (or large itemsets) in a database.
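The first step is commonly solved with a level-wise search in the style of Apriori: frequent k-itemsets are joined into (k+1)-candidates, candidates with an infrequent subset are pruned, and the survivors are counted against the database. The minimal Python sketch below illustrates that idea. The function name `apriori`, the absolute `min_support` count, and the toy transactions are assumptions for illustration, not code from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (level-wise, Apriori-style)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy database.
db = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}, {"milk"}]
freq = apriori(db, min_support=2)
# {milk, beer} occurs in only one transaction, so it is not frequent,
# and {bread, milk, beer} is pruned without being counted.
```

Pruning before counting is what keeps the candidate set small; the utility-based setting discussed next loses this property, because utility is not anti-monotone.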
Once the frequent itemsets are found, generating association rules is straightforward and can be accomplished in linear time. Most methods for finding frequent itemsets are designed for traditional databases. However, the frequency of an itemset may not be a sufficient indicator of significance, because frequency reflects only the number of transactions in the database that contain that itemset. It does not reveal the utility of an itemset, which can be measured in terms of cost, profit, or other expressions of user preference. Moreover, frequent itemsets may contribute only a small portion of the overall profit, whereas non-frequent itemsets may contribute a large portion of it. In reality, a …
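The gap between frequency and utility, and the reason negative item values complicate transaction-weighted utilization (TWU) pruning, can be illustrated with a small Python sketch. The item names, unit profits, quantities, and function names are hypothetical. The definitions (utility as quantity times unit profit; TWU as the sum of transaction utilities over transactions containing the itemset) follow common utility-mining conventions rather than the paper's text. Summing only positive item utilities per transaction is assumed here as one sound way to keep TWU an upper bound; it is not necessarily HUINIV-Mine's exact formulation.

```python
from itertools import combinations

# Hypothetical unit profits (external utilities); item "C" has a negative value.
PROFIT = {"A": 10, "B": 2, "C": -3}

# Hypothetical transactions: item -> purchased quantity (internal utility).
DB = [
    {"A": 1, "C": 2},
    {"B": 3},
    {"B": 2, "C": 1},
    {"B": 4},
    {"A": 2, "B": 1},
]


def contains(txn, itemset):
    return all(i in txn for i in itemset)


def support(itemset, db=DB):
    """Number of transactions containing the itemset."""
    return sum(1 for t in db if contains(t, itemset))


def utility(itemset, txn):
    """u(X, T): sum of quantity * unit profit over X, or 0 if T lacks X."""
    if not contains(txn, itemset):
        return 0
    return sum(txn[i] * PROFIT[i] for i in itemset)


def total_utility(itemset, db=DB):
    """u(X): utility of X summed over all transactions."""
    return sum(utility(itemset, t) for t in db)


def twu(itemset, db=DB, positive_only=True):
    """Transaction-weighted utilization of X.

    Assumption: with positive_only=True each transaction utility sums only
    positive item utilities, so the result bounds u(Y) from above for X and
    every superset Y of X even when some profits are negative. With
    positive_only=False the bound can fail."""
    total = 0
    for t in db:
        if contains(t, itemset):
            vals = [q * PROFIT[i] for i, q in t.items()]
            total += sum(v for v in vals if v > 0) if positive_only else sum(vals)
    return total


def high_utility_itemsets(min_util, db=DB):
    """Brute-force reference: every itemset X with u(X) >= min_util."""
    items = sorted(PROFIT)
    return {
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if total_utility(c, db) >= min_util
    }
```

In this toy data B is the most frequent item (support 4) but A has the higher utility (30 versus 20). With a threshold of 28, a TWU that sums all item utilities gives A only 26 and would wrongly prune it, while the positive-only variant gives 32 and keeps it as a candidate.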

Similar resources

Discovery of High Utility Itemsets Using Genetic Algorithm with Ranked Mutation

Utility mining is the study of itemset mining from the consideration of utilities. It is the utility-based itemset mining approach to find itemsets conforming to user preferences. Modern research in mining high-utility itemsets (HUI) from the databases faces two major challenges: exponential search space and database-dependent minimum utility threshold. The search space is extremely vast when t...

Mining high on-shelf utility itemsets with negative values from dynamic updated database

Utility mining emerged to overcome the limitations of frequent itemset mining by considering the utility of an item. Utility of an item is based on user’s interest or preference. Recently, temporal data mining has become a core technical data processing technique to deal with changing data. On-shelf utility mining considers on-shelf time period of item and gets the accurate utility values of it...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure

Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility like profits. It is an extension of the frequent pattern mining. Although a number of relevant algorithms have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets ...

Journal:
  • Applied Mathematics and Computation

Volume: 215

Publication date: 2009